The male-to-female sex ratio at birth is constant across world populations with an average of 1.06 (106 male to
100 female live births) for populations of European descent. The sex ratio is considered to be affected by numerous
biological and environmental factors and to have a heritable component. The aim of this study was to investigate
whether common alleles of modest effect at autosomal and chromosome X variants could explain the observed
sex ratio at birth. We conducted a large-scale genome-wide association scan (GWAS) meta-analysis across
51 studies, comprising overall 114 863 individuals (61 094 women and 53 769 men) of European ancestry and
2 623 828 common (minor allele frequency >0.05) single-nucleotide polymorphisms (SNPs). Allele frequencies
were compared between men and women for directly-typed and imputed variants within each study. Forward-time
simulations for unlinked, neutral, autosomal, common loci were performed under the demographic model for
European populations with a fixed sex ratio and a random mating scheme to assess the probability of detecting
significant allele frequency differences. We do not detect any genome-wide significant (P < 5 × 10⁻⁸) common
SNP differences between men and women in this well-powered meta-analysis. The simulated data provided results
entirely consistent with these findings. This large-scale investigation across ~115 000 individuals shows no
detectable contribution from common genetic variants to the observed skew in the sex ratio. The absence of
sex-specific differences is useful in guiding genetic association study design, for example when using mixed
controls for sex-biased traits.



INTRODUCTION 

The male-to-female sex ratio at birth is very constant across
world populations, ranging between 1.02 and 1.08 (102-108
male to 100 female live births), with an average of 1.06 for
populations of European descent (1-3). The sex ratio at
birth is mainly determined by factors influencing the
primary sex ratio, which is the sex ratio at conception, and
those influencing the survival of the embryo (4). Frequently
reported primary sex ratio-determining factors include motility 
and survival time of X-bearing and Y-bearing sperm. A 
proportion of prenatal mortality can be attributable to im- 
munological interaction between mother and embryo (4). 
Interestingly, more males are being born in spite of the fact 
that there is higher mortality of males than females during 
intrauterine life (4,5). In addition, the sex ratio is considered
to be affected by numerous other biological (endogenous)
and environmental (exogenous) factors, although their influence
is generally thought to be small (1,6). These
factors include gonadotropins and/or testosterone concentra- 
tion at the time of conception, ovulation induction, parental 
age, parity, birth order, coital rates, infertility, parental 
illness, maternal malnutrition, smoking, exposure to certain 
chemicals, stress, war, socioeconomic status and many 
others (1,6-8). Variation in sex ratio has also been observed
in many animal and plant species (9). Studies of parasitoid
wasps, particularly Nasonia vitripennis, identified several
quantitative trait loci (QTL) associated with the sex ratio,
pointing to a genetic contribution (9). In addition, many
authors suggest that the human sex ratio also has a heritable 
component. Paternal effects have been proposed to play a 
role in the sex ratio, for example, men with more brothers 
tend to have more sons whereas men with more sisters tend 
to have more daughters (5,10,11). Based on population
genetics modelling, Gellatly et al. (11) suggested that the sex
ratio is determined by common inheritance of polymorphic
autosomal genes that exert their effect through the male
reproductive system. Another study of reproductive fitness in the
Hutterite population suggested that genetic variants, both auto- 
somal and X-linked, influence natural fertility in humans (12). 
Research of human births in two-child families observed that 
sexes of offspring do not follow a binomial model of inherit- 
ance where probability of having a boy equals probability of 
having a girl (13). This study also pointed to the lack of inde- 
pendence among sexes of children of the same parents (13). A 
couple of possible genetic mechanisms underlying this obser- 
vation such as Y- and X-linked immunological incompatibil- 
ities between mother and embryo have been proposed (13,14). 

In the present study, we test whether common variant genetic 
effects partly underlie the observed male-to-female sex ratio at 
birth. To address this, we investigate the presence of autosomal 
and chromosome X variant differences between men and 
women across 114 863 individuals through large-scale genome- 
wide association study (GWAS) meta-analysis. We also 
conduct a forward-time simulation study to assess the probabil- 
ity of observing significant allele frequency differences at auto- 
somal markers between men and women. Our study has high 
power to detect loci with modest to small effect sizes. 



RESULTS 

GWAS meta-analysis results 

Figure 1. QQ plots for 2 623 828 directly genotyped and imputed SNPs: (A)
for all examined SNPs; (B) after exclusion of poorly genotyped/called SNPs.

Initial meta-analysis results pointed to an excess of associations
compared with the null distribution (Fig. 1A). We examined
all genome-wide significant SNPs with effective sample
size > 10 000 to check for false positives due to genotyping
error or other artefacts. We investigated three main diagnostic
metrics: poor cluster plots in men or women (Supplementary
Material, Fig. S1), sequence similarity on chromosome Y
and exact Hardy-Weinberg equilibrium (HWE), P < 1.0 ×
10~ for men or women. Autosomal SNPs that lie in 
genomic regions that have sequence similarity on the Y 
chromosome may be incorrectly genotyped/called in men, 
but not in women, which may give rise to false-positive asso- 
ciations. This can be traced through several quality control 
(QC) checks in men: excess of heterozygosity, deviation 
from HWE and poor cluster plots in men and not in women. 
Some/all of these factors were observed for SNPs designated
for exclusion from follow-up (all with highly significant
association P-values). We excluded SNPs from the pseudoautosomal
chromosome X boundary regions (within 55 kb on the
short arm of chromosome X and 115 kb on the long arm
that lie next to the non-pseudoautosomal regions) to guard
against genotyping error in men (for example, caused by the
presence of truncated copies of genes and/or mappings to
multiple places across the genome). After exclusion of all poorly
genotyped/called SNPs, we detect a single genome-wide
significant association at a non-pseudoautosomal chromosome
X variant (rs12689384, P = 2.66 × 10⁻¹³), which is an
intronic variant within RBMX2. As this SNP was imputed in all
studies driving the association, we checked cluster plots of 
all directly typed variants 500 kb upstream and downstream 
of the associated SNP in all studies, excluded SNPs with 
poor clustering, re-imputed the region and re-ran the
meta-analysis. The significance of rs12689384 dropped by
five orders of magnitude (allele G, OR = 1.18, 95% CI
[1.11-1.25], P = 3.27 × 10⁻⁸) but remained nominally
genome-wide significant. However, there are several factors 
that reduce the credibility of this finding. First, 12 studies in 
total contributed summary statistics for this variant (for a 
total of 33 259 individuals), out of which six WTCCC1 
studies drive the association (Supplementary Material, 
Table S1). This SNP is imputed in all WTCCC1 studies.
Secondly, as shown in the regional association plot
(Supplementary Material, Fig. S2), this SNP lacks support for
association from neighbouring variants. The next strongest
association in the region is modest (P = 1.99 × 10⁻⁴)
and observed at an SNP (rs2294956) which is
in perfect linkage disequilibrium with rs12689384 (r² = 1,
D' = 1) based on HapMap CEU, on which imputation was
based. This second SNP (rs2294956) is also imputed in the
WTCCC1 studies, but the meta-analysis for it includes data
from over twice the sample size (31 studies, 67 162
individuals, Supplementary Material, Table S1). All studies
contributing directly typed data for this variant (46 066 individuals
with directly typed data) show no evidence for association,
indicating that the signal observed at the imputed variant 
may be an artefact. We have therefore not considered this 
single associated SNP any further. 
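
As a rough plausibility check on the estimate reported above, the standard error and two-sided P-value can be approximately recovered from the odds ratio and its 95% confidence interval. The sketch below assumes a Wald-type interval that is symmetric on the log-odds scale, which the text does not state; the function name is illustrative. Because the published OR and CI are rounded to two decimals, the result can be expected to agree with the reported P-value only in order of magnitude.

```python
import math

def wald_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate SE, z and two-sided P from an OR and its 95% CI.

    Assumes the interval is exp(log(OR) +/- z_crit * SE), i.e. a Wald
    interval on the log-odds scale (an assumption, not taken from the text).
    """
    log_or = math.log(odds_ratio)
    # Width of the interval on the log scale is 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_or / se
    # Two-sided normal tail probability: P(|Z| > z) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p

# Values reported for rs12689384 after re-imputation (allele G).
se, z, p = wald_from_or_ci(1.18, 1.11, 1.25)
print(f"SE = {se:.4f}, z = {z:.2f}, P = {p:.2e}")
```

The recovered P-value lies close to the genome-wide threshold of 5 × 10⁻⁸, in line with the signal being only nominally significant.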

The distribution of association P-values after meta-analysis
QC was consistent with the null (Fig. 1B). Our study has 80%
power to detect an OR of 1.13 (at α = 5 × 10⁻⁸) for SNPs with
minor allele frequency (MAF) >5%, assuming an additive
model. We did not further examine SNPs with P-values 
above the genome-wide significance threshold since our 
study had sufficient power to detect associations of loci with 
small-to-modest effect sizes. 



Simulation study results 

Association analysis of 1 337 699 autosomal common and
135 988 autosomal low-frequency variants in the simulated
case-control set matching the empirical study did not identify any
significant differences in allele frequencies between men and
women (α = 5 × 10⁻⁸). Quantile-quantile (QQ) plots for simulated
common and low-frequency variants are shown in
Supplementary Material, Figure S3.

DISCUSSION 

This large-scale investigation across 114 863 individuals
identified no detectable contribution from common genetic
variants to the observed skew in sex ratio at birth. This
study combined the data from 51 cohorts and has excellent 
power to detect small-to-modest effect sizes at common loci. 
The sample sizes contributing to the analysis of chromosome 
X SNPs were lower due to limited overlap of directly typed 
SNPs across platforms (in the absence of imputed data 
across all studies). However, power remains high at over 
80% to detect small-to-modest effect sizes. From the pheno- 
typic aspect, sex is a well-characterized trait representing an 
additional strength of this study, which is unlikely to suffer 
from phenotype misclassification. 

Our results, within the power constraints of our study, indi- 
cate that sex-specific selection against particular autosomal 
genetic variants is not a plausible explanation for the observed 
male-to-female sex ratio at birth and argue against the hypoth- 
esis that incompatible genotypes at common variants between 
the autosomes and sex chromosomes could lead to miscar- 
riage, thus generating sex-specific genetic differences. We per- 
formed forward-time simulations of 1.3 million independent 
autosomal, common, neutral loci, conditioning on the 
male-to-female sex ratio at birth, in a cohort matching the ori- 
ginal study sample. The lack of any significant allele fre- 
quency differences between men and women was in keeping 
with the findings of the GWAS meta-analysis for autosomal 
SNPs. We also tested the effects of low-frequency variants 
in the simulated data, and found no evidence for association 
with the observed sex ratio at birth. However, we cannot 
rule out the effects of rare, structural or chromosome Y var- 
iants since these were not analysed in our study. 

Sex chromosome loci may be relevant for the sex ratio 
determination due to their expression in the reproductive 
system, their role in spermatogenesis, sperm morphogenesis 
and movement and male-female fertility in general (15,16). 
Therefore, we performed a comprehensive chromosome X ana- 
lysis involving two main chromosome X regions: pseudoautoso- 
mal and non-pseudoautosomal. There are two pseudoautosomal 
regions (PAR1 and PAR2), which are homologous on X and Y
chromosomes, and for which men and women carry two
alleles per SNP, whereas for the non-pseudoautosomal region
men carry only one allele per SNP. We investigated allele
frequency differences between men and women in both
chromosome X regions and we observed the association of one
non-pseudoautosomal SNP (rs12689384) with a P-value just below the
genome-wide significance threshold. For various reasons expanded in the Results
section, we believe that this variant may be an imputation artefact 
and have thus not taken it forward to further studies. 

The investigated dataset consisted of more women (61 094) 
than men (53 769). Our meta-analysis incorporated summary 
statistics deriving from 51 collaborating studies and the vast 
majority of these studies (36 studies) are population based. 
The main difference in the sex ratio is driven by these popu- 
lation based studies and the reasons for having fewer men 
can be heterogeneous and study specific. Most likely the 
main reasons are the generally recognized lower male re- 
sponse to take part in epidemiological population-based 
studies (17) and/or sex differences in longevity where 
women have a higher expected lifespan (18). Fifteen of the 
51 contributing studies are disease- rather than population- 
based and the sex ratio in these studies approximately
corresponds to the disease sex ratio in the population. We were
driven by the rationale that the sex ratio at birth is constant
throughout time and across all world populations, meaning 
that common variants are more likely to underlie the observed 
sex ratio at birth. Therefore, in the case of a higher male death 
rate, we would still have enough power to detect common 
variant differences due to a very large sample set. However, 
there are scenarios where this sampling difference between 
men and women might cause bias, for example there may be 
a genetic variant that is influencing both the sex ratio and lon- 
gevity in men. In that case, higher male death rate would cause 
the removal of this specific genetic variant, thus masking the 
signal. 

Our results have important implications for genetic associ- 
ation study design, for example regarding the selection of 
control sets for sex-biased traits such as prostate cancer in 
men or anorexia nervosa in women. The use of single-sex con- 
trols for sex-specific diseases generally decreases the sample 
size and power of a study. Our findings demonstrate that 
mixed sex controls can be used as an appropriate set in 
studies of sex-specific traits, when focusing on common loci. 
As one additional implication for genetic association study 
analyses, our study stresses the importance of careful pre- 
and post-analysis QC. QQ plots of our initial meta-analysis 
results showed high deviation from the null, yet, after QC 
we observe no inflation of signal. A robust and thorough QC 
pipeline is necessary to verify any positive association 
signals, especially in meta-analyses where many studies con- 
tribute data that were genotyped (and phenotyped) in many 
different settings. 

We conclude that common genetic variants do not play a 
role in defining male-to-female sex ratio at birth. In this 
large-scale meta-analysis of ~115 000 individuals, we found
no allele frequency differences at common loci between men 
and women. Simulated data of autosomal neutral variants 
support these findings. Our results can be useful in informing 
GWAS study design, especially when using mixed controls for 
sex-biased traits. 

MATERIALS AND METHODS 

Study samples 

We conducted genome-wide meta-analysis across 51 studies, 
comprising overall 114 863 individuals (61 094 women and 
53 769 men) of European ancestry. The characteristics of 
samples from contributing studies are presented in Supple- 
mentary Material, Table S2. 

Ethics statement 

Each study obtained ethical approval from their respective 
research ethics committee and all participants gave signed 
informed consent in accordance with the Declaration of Helsinki. 

Genotyping, imputation and QC 

All samples were genotyped using commercially available 
Illumina (Illumina, Inc., San Diego, CA, USA) or Affymetrix 
(Affymetrix, Inc., Santa Clara, CA, USA) platforms.
Imputation of missing genotypes was based on HapMap Phase II
genotypes for the European population (CEU). QC of directly
typed and imputed variants was conducted separately in each 
study. Study-specific information on genotyping platforms, 
imputation methods and QC metrics is presented in Supple- 
mentary Material, Table S3. QC checks included tests for 
relatedness among samples within individual studies. 

Genome-wide association analysis of autosomal variants 

Case-control association analysis of autosomal SNPs was 
conducted under the additive model, for directly typed and 
imputed variants, within each study. Women were coded as 
cases and men as controls. Association analyses of imputed 
variants took genotype uncertainty into account, with the ex- 
ception of the QIMR study which conducted analysis on best- 
guess genotypes. Where necessary, the first three genotype- 
based principal components were used as covariates. Studies 
with related individuals additionally adjusted analyses for 
family relatedness using linear mixed models. Study-specific 
association analysis software is presented in Supplementary 
Material, Table S3. 

Chromosome X analysis 

Each contributing study performed two separate chromosome 
X analyses, covering the pseudoautosomal and
non-pseudoautosomal regions. Association analyses were
performed, as for the autosomes, under the additive model. Overall,
42 studies analysed the pseudoautosomal region; 11
of these used data imputed from HapMap Phase II, and all others
used directly typed variants only. For the non-pseudoautosomal
region, 46 studies performed association analysis; 12 of
these used HapMap Phase II imputed data, whereas the others
used directly typed variants only. Study-specific chromosome
X imputation/association analysis software is presented in 
Supplementary Material, Table S3. 

GWAS meta-analysis 

We performed fixed and random effects meta-analysis to syn- 
thesize summary statistics results across contributing studies 
to identify autosomal and chromosome X common SNP differ- 
ences between men and women. For meta-analysis purposes, 
we used GWAMA (19). Prior to meta-analysis, we excluded 
SNPs with MAF lower than 0.05 and SNPs with low imput- 
ation accuracy scores. Specifically, we used a cut-off of 
rsq_hat < 0.3 for genotypes imputed with MACH (20), 
BEAGLE (21) and PLINK (22) software and a cut-off of 
proper info score <0.5 for IMPUTE (23) software. Overall, 
2 623 828 directly genotyped and imputed SNPs passed QC 
criteria and were included in the meta-analysis. The genomic 
control (GC) inflation factor (lambda) was calculated and 
applied to correct the results for each study separately prior 
to the meta-analysis. The meta-analysis results were also
corrected for overall lambda GC. The average GC inflation factor
across studies was 1.005 for directly genotyped SNPs, 0.97 for
imputed SNPs and 1.007 overall, suggesting little population
stratification. To determine the effective number of individuals 
for each study, we calculated the effective number of cases
(N_eff_case) and multiplied it by 2. N_eff_case was derived
using the formula N_eff_case = 2 × N_case × N_ctrl /
(N_case + N_ctrl), where N_case and N_ctrl are the numbers
of cases (women) and controls (men), respectively. We
investigated evidence of heterogeneity using the I² statistic (24).
Genome-wide significance was set to 5 × 10⁻⁸. We created
QQ plots to visualize meta-analysis association results. The 
power of our study was determined using QUANTO (25). 
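
The effective sample size formula above can be illustrated with a short calculation. This is a minimal sketch: the function name is an assumption, and the formula is applied here to the pooled counts purely for illustration, whereas the study computed it separately for each contributing study.

```python
def effective_sample_size(n_case, n_ctrl):
    """Return (N_eff_case, effective number of individuals).

    N_eff_case = 2 * N_case * N_ctrl / (N_case + N_ctrl), and the
    effective number of individuals is 2 * N_eff_case, as in the text.
    """
    n_eff_case = 2.0 * n_case * n_ctrl / (n_case + n_ctrl)
    return n_eff_case, 2.0 * n_eff_case

# A perfectly balanced study loses nothing: 1000 cases and 1000 controls
# give an effective total of 2000.
balanced = effective_sample_size(1000, 1000)

# Pooled counts from this meta-analysis (women as cases, men as controls),
# used only as an illustration of the formula.
n_eff_case, n_eff_total = effective_sample_size(61094, 53769)
print(f"N_eff_case = {n_eff_case:.1f}, effective N = {n_eff_total:.1f}")
```

Because the numbers of women and men are close to balanced, the effective number of individuals is only slightly smaller than the actual total of 114 863.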

Simulation study 

To exclude the possibility that our null results for autosomal 
variants are due to either sampling bias or data quality and 
to examine the probability of having false positives within 
the power constraints of our study, we sought a theoretical cor- 
roboration of our empirical results by conducting association 
analysis in an 'ideal' unbiased simulated dataset. Simulated 
genetic data were produced by means of forward-time simula- 
tion (26-28) under a model of a single population with two 
bottlenecks according to Schaffner et al. (29) with two
exceptions: recent exponential growth of population size (instead
of instantaneous changes) and final effective population size
of 10⁶ (instead of 10⁵), as this has been shown to be the
case for the European population (30). Demographic model
parameters are given in Supplementary Material, Table S4.
The generation time was assumed to be 25 years, and the
mutation rate per site per generation was 1.5 × 10⁻⁸ (29). We
applied a fixed sex ratio and a random mating scheme (i.e. 
parents are randomly selected irrespective of their genotype) 
validated by different genetic and demographic models (27). 
We set the probability of a male offspring to 0.5122,
which corresponds to a male-to-female ratio of 1.05.
Simulations were run for 17 000 generations after which we randomly
sampled women and men matching the original study for 
sample size (women = 61 094; men = 53 769). 
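
The correspondence between the male offspring probability and the male-to-female ratio quoted above follows from ratio = p / (1 - p). A minimal sketch of the conversion in both directions is given below; the function names are illustrative and not taken from the simulation software.

```python
def male_probability(sex_ratio):
    """Probability of a male birth for a given male-to-female ratio."""
    return sex_ratio / (1.0 + sex_ratio)

def male_to_female_ratio(p_male):
    """Male-to-female ratio for a given probability of a male birth."""
    return p_male / (1.0 - p_male)

p = male_probability(1.05)
r = male_to_female_ratio(0.5122)
print(f"p(male) = {p:.4f}, ratio = {r:.2f}")
```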

We simulated unlinked, neutral, autosomal common var- 
iants with initial MAF of 0.02 in the founder population 
(27,31). The total number of simulated loci was 56 502 900, 
out of which 2.4% were common (MAF > 0.05). We
performed allele-based chi-squared association tests on the
1 337 699 common loci. This figure matches the estimated
number of independent SNPs in HapMap CEU samples of
around 1 million (32). We additionally performed allele-based
Fisher's exact association tests on 135 988 low-frequency
variants (MAF 0.01-0.05). Supplementary Material, Figure S4
shows the MAF spectrum for simulated data compared with 
the 1000 Genomes Project Pilot 3 CEU (2n = 60) data. 
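
An allele-based chi-squared test of the kind described above compares allele counts between women and men in a 2 × 2 table. The following is a minimal sketch, assuming a Pearson chi-squared statistic with one degree of freedom and no continuity correction; the function name and the example counts are hypothetical and are not taken from the simulated data.

```python
import math

def allelic_chi2(alt_women, total_women, alt_men, total_men):
    """Pearson chi-squared test (1 df) on a 2x2 table of allele counts.

    alt_* are counts of one allele; total_* are total allele counts
    (twice the number of individuals for an autosomal locus).
    """
    a, b = alt_women, total_women - alt_women
    c, d = alt_men, total_men - alt_men
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # Monomorphic locus or empty group: no evidence of a difference.
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 df, the chi-squared upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: identical allele frequency (0.30) in both sexes.
chi2_null, p_null = allelic_chi2(30, 100, 60, 200)
# Hypothetical counts: allele frequency 0.50 in women versus 0.30 in men.
chi2_diff, p_diff = allelic_chi2(50, 100, 30, 100)
print(chi2_null, p_null, round(chi2_diff, 3))
```

For the low-frequency variants the study used Fisher's exact test instead, which is preferable when expected cell counts are small.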

SUPPLEMENTARY MATERIAL 

Supplementary Material is available at HMG online. 